Skip to content

20190117 prom rules endpoints #1999

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Feb 4, 2020

Conversation

jtlisi
Copy link
Contributor

@jtlisi jtlisi commented Jan 17, 2020

What this PR does:

This PR implements the Prometheus API alerts and rules endpoints in ruler. It adds a GRPC service to the ruler to allow for an implementation that works when rules are sharded across multiple rulers.

Which issue(s) this PR fixes:
Fixes #1720

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md

@jtlisi jtlisi marked this pull request as ready for review January 18, 2020 01:36
@jtlisi jtlisi force-pushed the 20190117_prom_rules_endpoints branch from d198452 to ad00183 Compare January 18, 2020 01:38
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job @jtlisi ! It's the first time I'm seeing the ruler internals, so please be patient going through the comments I've left (including some questions).

@jtlisi jtlisi requested a review from pracucci January 22, 2020 20:32
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Jacob for addressing my feedback. I've done a second pass review and started doing some manual tests, and I've the feeling this PR is not production-grade yet. I would prefer if, after addressing the comments, you could do extensive manual tests before asking for the next review.

Could you also rebase this PR, please?

pkg/ruler/api.go Outdated

// RegisterRoutes registers the configs API HTTP routes with the provided Router.
func (r *Ruler) RegisterRoutes(router *mux.Router) {
if r.store == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a real case? I can't see how it could happen.

@@ -4,6 +4,7 @@ import (
"context"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of small things:

  1. We usually name such files with the _mock suffix (instead of prefix)
  2. In order to have this file compiled only with tests, you could add a _test.go suffix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to change it to store_mock_test.go lol

interval := group.Interval()
groupDesc := &rules.RuleGroupDesc{
Name: group.Name(),
Namespace: group.File(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  1. Why is it called Namespace if it then maps the File in the API response (see api.go)?
  2. Is it really a good thing exposing the path of the file inside Cortex? Looks we may disclose a sensitive information in a Cortex as a service scenario

Copy link
Contributor Author

@jtlisi jtlisi Jan 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I updated the downstream path to strip the internal cortex file path. The only thing that will be left is the namespace of the users rule group.

The reason Namespace was chosen instead of file is basically to avoid have file be an API object when interacting with Cortex. It's more semantics than anything else, however I think it is good to conceptually differ between the two.

I mention it here: https://docs.google.com/document/d/1gNKiC5L_zWQ4GGgtdhJJ6FnaZXyJJBvdGTUmF0wgEFg/edit?usp=sharing

@jtlisi
Copy link
Contributor Author

jtlisi commented Jan 24, 2020

@pracucci I'm only part way done refactoring this. I'll try to get it ready for another pass by Monday

@jtlisi jtlisi force-pushed the 20190117_prom_rules_endpoints branch from cb310ec to 8ecebe6 Compare January 28, 2020 19:37
@jtlisi
Copy link
Contributor Author

jtlisi commented Jan 28, 2020

@pracucci This should be ready for another look. The scope of this PR broadened a bit while getting the tests set up. The ruler and mapper are now explicitly passed a logger upon instantiation. Also there was some updates to the fixtures to make sure the ruler loaded it's rules before the tests are run.

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jtlisi for addressing my feedback. I kindly ask you a bit more patience to take a look at the last comments. There are many small things, except a couple of critical ones (wrong API path and missing return err in initRuler()) 🙏

CHANGELOG.md Outdated
@@ -6,6 +6,7 @@
* [CHANGE] Moved `--store.min-chunk-age` to the Querier config as `--querier.query-store-after`, allowing the store to be skipped during query time if the metrics wouldn't be found. The YAML config option `ingestermaxquerylookback` has been renamed to `query_ingesters_within` to match its CLI flag. #1893
* `--store.min-chunk-age` has been removed
* `--querier.query-store-after` has been added in it's place.
* [FEATURE] Added flag `experimental.ruler.enable-api` to enable the ruler api which implements the Prometheus API `/api/v1/rules` and `/api/v1/alerts` endpoints. #1999
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which implements the Prometheus API /api/v1/rules and /api/v1/alerts endpoints

True, but we export them under /api/prom/.... I would mention that as well, to make it more clear to the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make it clear it is prefixed with the configured -http.prefix.

@@ -456,7 +461,7 @@ func (t *Cortex) initAlertmanager(cfg *Config) (err error) {

// TODO this clashed with the queirer and the distributor, so we cannot
// run them in the same process.
t.server.HTTP.PathPrefix("/api/prom").Handler(middleware.AuthenticateUser.Wrap(t.alertmanager))
t.server.HTTP.PathPrefix(cfg.HTTPPrefix).Handler(middleware.AuthenticateUser.Wrap(t.alertmanager))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth to mention it in the changelog as a separate entry? I guess it's an [ENHANCEMENT]. Not sure, just asking.

Copy link
Contributor Author

@jtlisi jtlisi Jan 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure when HTTPPrefix was added. I think the best thing to do would be to create a small PR that ensures everything is configured using the Prefix and adding an [ENHANCEMENT] to the change log clarifying it then. I'll revert the prefix for anything unrelated to the ruler in this PR for now.

pkg/ruler/api.go Outdated
"github.com/cortexproject/cortex/pkg/util"
)

// RegisterRoutes registers the configs API HTTP routes with the provided Router.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

registers the configs API > registers the ruler API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -415,11 +415,16 @@ func (t *Cortex) initRuler(cfg *Config) (err error) {
cfg.Ruler.Ring.ListenPort = cfg.Server.GRPCListenPort
queryable, engine := querier.New(cfg.Querier, t.distributor, t.store)

t.ruler, err = ruler.NewRuler(cfg.Ruler, engine, queryable, t.distributor)
t.ruler, err = ruler.NewRuler(cfg.Ruler, engine, queryable, t.distributor, prometheus.DefaultRegisterer, util.Logger)
if err != nil {
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be return err

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch


var groups []*promRules.Group
r.userManagerMtx.Lock()
mngr, exists := r.userManagers[userID]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could compact it into the following, which also reduce the scope of variables:

if mngr, exists := r.userManagers[userID]; exists {
    groups = mngr.RuleGroups()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}

func (r *Ruler) getLocalRules(userID string) ([]*rules.RuleGroupDesc, error) {
groupDescs := []*rules.RuleGroupDesc{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you move this line after the locked block, you can do groupDescs := make([]*rules.RuleGroupDesc{}, 0, len(groups))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

continue
}
}
r.syncManager(ctx, user, filteredGroups)
}

// Check for deleted users and remove them
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself: this block hasn't changed, except the logger reference.

}

if update {
level.Info(r.logger).Log("msg", "updating rules", "user", "user")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Info or debug level? Up to you, just wanted to raise it, cause may be a bit verbose for an Info level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an update would be a relatively rare event? Since we have a metric I'm ok with lowering it to debug though.

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All in all, LGTM. I haven't manually tested getShardedRules(), so I do expect you will do it before merging. I left few more nits, but they're all trivial, so no need to re-check it from my side. However, please re-request me a review if something more consistent will change 🙏

State: rl.GetState(),
Name: rl.GetAlert(),
Query: rl.GetExpr(),
Duration: rl.For.Seconds(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as it's not a breaking change, LGTM.

jtlisi added 19 commits February 2, 2020 14:40
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
Signed-off-by: Jacob Lisi <[email protected]>
@jtlisi jtlisi force-pushed the 20190117_prom_rules_endpoints branch from 1009efa to ecf1860 Compare February 2, 2020 22:54
@jtlisi jtlisi merged commit 3d55571 into cortexproject:master Feb 4, 2020
@jtlisi jtlisi deleted the 20190117_prom_rules_endpoints branch January 26, 2021 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ruler does not currently support /api/v1/rules or /api/v1/alerts
2 participants